ABSTRACT

This paper proposes a new class of online policies for schedul- 
ing in input-buffered crossbar switches. For a system with 
arrivals, our policies achieve the optimal throughput, with 
very weak assumptions on the arrival process. For a system
without arrivals, our policies drain all packets in the system
in the minimal amount of time (providing an online alternative
to the batch approach based on Birkhoff-VonNeumann
decompositions). Policies in our class are not constrained to
be work conserving in every time slot; that is, it may be
possible to add edges to the schedule they produce.

Most algorithms for switch scheduling take an edge-based
approach; in contrast, we focus on scheduling (a large enough
set of) the most congested ports. This alternate approach
allows for lower-complexity algorithms, and also requires
a non-standard technique to prove throughput-optimality.
One algorithm in our class, Maximum Vertex-weighted Matching
(MVM), has worst-case complexity similar to Max-size
Matching, and in simulations shows better delay performance
than Max-(edge)weighted-Matching (MWM).

1. INTRODUCTION 

A commonly used switching fabric in high speed packet
switches (e.g., Internet routers) is a crossbar with input
queues (IQ) to hold packets during times of congestion. An
$N_1 \times N_2$ input-buffered crossbar switch contains $N_1$
input ports and $N_2$ output ports. The crossbar is constrained
to schedule a matching, i.e., it can send at most one packet
from any input port, and receive at most one packet at any
output port, in a single time slot. The switch scheduling
problem is to determine which matching is to be used in every
time slot.

Most algorithms for switch scheduling take an edge-based
approach, attempting to schedule either a maximal/maximum
set of edges, or those with the largest queues. In this paper
we design policies that look only at the weight of the ports
in the switch; queues on the individual edges matter only to
the extent that they are non-zero. Intuitively, our policies
ensure scheduling of a large enough set of heavy ports in the
system. By looking at port weights, we are able to
characterize a new class of policies that have lower worst-case
complexities and are potentially simpler to implement.

Figure 1: Class of scheduling policies for Switches.
[Diagram labels: Critical Port Policies, Throughput Optimal
Policies, LHPF, MVM, MSM, MWM.]

To analyze our algorithms, we use a node-based analysis 
technique. We show that our class of policies is throughput 
optimal, i.e., they result in stable queues at all admissible 
loads. We prove throughput optimality using a novel, non- 
standard Lyapunov function: the maximum total queue at 
any port. In addition, our policies also achieve minimum 
clearance time, i.e., given an initial loading on the switch 
and no further arrivals, they remove all the packets in the 
minimum possible time. These policies do not require a 
priori knowledge of arrival rates. 

1.1 Main Results 

The focus of this paper is on the design and analysis of poli- 
cies that determine schedules based upon the total queues 
at the nodes/ports of the switch. We will construct a class 
of such policies that are both throughput and clearance- 
time optimal. Throughout this paper, we will use "node" 
and "port" interchangeably. We will also use "weight" and 
"queue" interchangeably; they refer to the cumulative queue 
at the node. For an input port, the queue is the total num- 
ber of packets waiting to be transferred from the port; for an 
output port the queue is the number of packets waiting to 
be transferred to the port. Finally, a matching M is said to
match a node i if it contains some edge touching i. We now
describe the classes of node-weighted policies we investigate.

Figure 2: Different types of matchings for a switch.

Critical Port: Given a node-weighted bipartite graph, a 
port i is critical if its weight is no smaller than any other 
port. A matching M is a critical port matching if it matches 
every critical port. A scheduling policy is a critical port 
policy if it produces a critical matching in every time slot. 

Maximum Vertex-weighted Matching (MVM): A matching M is
an MVM if the total weight of the nodes it matches is no
smaller than the total weight of the nodes matched by any
other matching M'. The MVM scheduling policy is one which
schedules an MVM in every time slot.

Lazy Heaviest Port First (LHPF): The threshold l(M) 
of a matching M is the lowest positive integer such that M 
matches all ports with weight greater than or equal to l(M). 
So, for example, a perfect matching has threshold 1. M is 
an LHPF matching if it has the lowest threshold among all 
possible matchings. We call this the optimal threshold. An 
LHPF policy is one that produces an LHPF matching in 
every time slot. 
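
The three definitions above can be made concrete with a small sketch. The port names, the queue values, and the helper names (`port_weights`, `is_critical_port_matching`, `vertex_weight`, `threshold`) are illustrative assumptions and do not come from the paper; the code only checks the definitions on a toy 2x2 switch.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (input port, output port)


def port_weights(queues: Dict[Edge, int]) -> Dict[str, int]:
    """Weight of a port = total queue on all edges touching it."""
    w: Dict[str, int] = {}
    for (i, j), q in queues.items():
        w[i] = w.get(i, 0) + q
        w[j] = w.get(j, 0) + q
    return w


def matched_nodes(matching: List[Edge]) -> Set[str]:
    return {node for edge in matching for node in edge}


def is_critical_port_matching(matching: List[Edge], w: Dict[str, int]) -> bool:
    """True if every port of maximum weight is matched."""
    top = max(w.values())
    critical = {n for n, wn in w.items() if wn == top}
    return critical <= matched_nodes(matching)


def vertex_weight(matching: List[Edge], w: Dict[str, int]) -> int:
    """Total weight of the nodes matched (the quantity MVM maximizes)."""
    return sum(w[n] for n in matched_nodes(matching))


def threshold(matching: List[Edge], w: Dict[str, int]) -> int:
    """Lowest positive integer l such that all ports of weight >= l are matched."""
    covered = matched_nodes(matching)
    heaviest_unmatched = max((wn for n, wn in w.items() if n not in covered), default=0)
    return heaviest_unmatched + 1


# Hypothetical queues: inputs a, b and outputs x, y.
queues = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 1}
w = port_weights(queues)          # a=5, b=1, x=4, y=2
m1 = [("a", "x")]                 # matches the single critical port a
m2 = [("a", "y"), ("b", "x")]     # perfect matching, threshold 1
m3 = [("b", "x")]                 # misses the critical port a
```

Here `m2` has the lowest possible threshold and the largest vertex weight, so on this toy instance it is both an LHPF matching and an MVM, while `m1` is a critical port matching that is not LHPF.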

The main result of this paper is that any LHPF policy is
throughput-optimal (Thm. 1). The proof of this result uses
a novel Lyapunov function: the weight of the heaviest port.
We also show that a policy is clearance time optimal iff it
is a critical port policy (Prop. 1). Given any queue
configuration, a critical port matching always exists; we also
provide a simple way to find it. This enables us to develop
a "slot-by-slot" algorithm for the clearance problem, as
opposed to existing batch policies [10, 20, 31] based on
Birkhoff-VonNeumann decompositions.

We call our class "lazy" because an LHPF matching may not
even be maximal; in particular, it may not match any extra
nodes below the optimal threshold (beyond what it needs to
satisfy those above the threshold). To clarify our classes of
policies we give a simple example in Fig. 2.



Consider a 4x4 IQ switch with edge weights and corresponding
port weights as shown in Fig. 2. There is only one critical
port (port a), with weight 10. So a critical port policy
must at least schedule port a. Now let us consider an LHPF
matching. It is clear that the size of a matching can be at
most three. It follows that not all the ports on the output
side can be matched; in particular, the threshold must be
strictly greater than 3. Hence any matching that at least
schedules ports a, i and j is LHPF. For example, II, III and
IV are LHPF matchings. There is a unique MWM (III) in
the graph. There are many MVMs in this graph; III and IV
are both MVMs.

Fig. [1] shows how the different policy classes relate to each 
other. It is well known that MWM [U [2H1 [23 [301 ES] and 
MVM [19] are throughput optimal. It is also known that
the MVM policy is a maximum-size matching policy [19],
but that not all MSM policies are throughput optimal
[18, 15]. For the policy classes defined in this paper,
critical port policies need not be throughput optimal.
Lemma 4 shows that any LHPF policy is also a critical port
policy. Theorem 1 shows that any LHPF policy is throughput
optimal. Corollary 2 shows that any MVM is an LHPF matching.
In Section 3.1 we provide an example to show that the
clearance time of popular existing policies (like MWM, MSM
and Greedy weighted maximal matching (GMM)) may be as
large as twice the optimal.

We now discuss some implications of our work from an
algorithmic perspective. We prove simple properties of LHPF
policies which can be used as a source for algorithms to find
LHPF matchings in node-weighted graphs. This is similar
to augmenting-path characterizations, which provide
algorithms for edge-weighted matching problems (like maximum
cardinality, maximum-edge-weight etc.). We also provide a way
to modify simple but non-throughput-optimal policies (like
edge-based greedy, or maximal matching) into throughput
optimal ones via post-processing. We elaborate on this
procedure in Section 4.

The tradeoff between delay and implementation complexity
has been studied in [21, 20, 19]. In general, a lower
complexity scheduler will result in higher delays. There are
simple algorithms like maximal matching and GMM, which
empirically perform well in most cases [11, 12, 25], but are
difficult to analyze. In fact they are not even throughput
optimal in some cases. When used in conjunction with LHPF
(via post-processing), they should have both good delay and
throughput. Note that one member of the LHPF class is the
MVM algorithm, which can be shown empirically to have delay
performance very close to the well known delay-optimal
MWM-α [TU [28] [57] policies at a lower complexity
PI] of $O(N^{2.5})$ as compared to $O(N^3)$ for MWM [13].
The LHPF class contains policies which are simpler to
implement than MVM and hence are potential candidates for a
low complexity, delay efficient scheduler with theoretical
guarantees on the throughput. Additionally, these policies
are clearance-time optimal.

1.2 Related Work 

Throughput optimal policies can be classified broadly into
backlog-aware policies, which require knowledge of the
backlogs at every time slot, and those which are backlog
independent.

Backlog independent policies instead use the knowledge of
the arrival rates [1, 3] to construct a randomized or periodic
scheduling rule precisely matched to the input rates. Such
scheduling offers arbitrarily low per-time-slot computation
complexity at the cost of large delays (shown to be O(N)
in [20], where N is the size of the switch).

Backlog aware policies can be further classified into those 
which are frame based or batch based and the online policies. 

The frame based policies are considered in [31, 10, 20] and
are based on the principle of iteratively clearing the
backlog in minimum time. The throughput optimality of these
policies is restricted to Bernoulli i.i.d. traffic in [31, 10].
In [20], prior knowledge of the statistics of the arrival
process is required to be able to select the frame size
appropriately so as to achieve throughput optimality. Minimum
clearance time policies have been applied to stabilize
networks in [23, 22]. Batch based policies [10] are similar to
the frame based policies except that the frame size is
dependent on the traffic arrival pattern and the scheduling
algorithm used.

In this paper we restrict our attention to the development 
of online algorithms, which attempt to schedule traffic by 
computing a matching every time slot. One such policy 
is the famous MWM policy which computes the maximum 
weight matching and is known to be throughput optimal. 
The proof of stability can be provided either in the fluid
limit [4] or in the stochastic sense [29]. In either case it
essentially hinges on a quadratic Lyapunov function and on
ensuring that its drift is negative.

The Maximum Size Matching (MSM) policy schedules the
maximum size matching and hence maximizes the instantaneous
throughput in each time slot. However, it is known
that if ties are broken randomly, MSM does not achieve
100% throughput for all admissible Bernoulli traffic patterns
[18, 15]. It is possible that if the ties are broken carefully,
a special MSM might be stable. Among the class of MSM
policies, there are two policies that have been proposed in
the literature to be throughput optimal: MVM and MWM-0+.
MVM is known to be throughput optimal [19]. The proof of
throughput optimality in [19] uses the fact that an MVM on a
graph G is an MWM on a graph G', where edge weights have
been selected carefully. The technique to prove throughput
optimality of MVM is essentially the same as that for MWM.
The proof provided in this paper serves as an alternative,
since MVM is a member of the LHPF class of policies.
MWM-0+ : At each time slot, consider all matchings 
which have maximal size. Among these choose one which 
has maximum weight, with weight function log. Break ties 
arbitrarily. This is conjectured to be throughput optimal in 

It is useful to also consider online scheduling according to
maximal matchings, which are matchings where no new edges
can be added without sharing a node with an already matched
edge. Maximal matchings can be found with $O(N^2)$
operations and the computation is easily parallelizable to
$O(N)$ complexity [31]. Greedy weighted maximal matching
(GMM) is a scheduler that tries to schedule the heavy edges.
The GMM policy has been analyzed for the general class of
networks with interference constraints [6], where it is shown
that it achieves full throughput in a network that satisfies
the local pooling condition. In simple terms, the local
pooling condition means that a vector $\lambda$ in the
capacity region cannot dominate another vector $\mu$ in the
capacity region in all the coordinates. This result can be
generalized [11, 12, 2] to show that GMM achieves at least a
certain fraction of the capacity region given by the local
pooling factor. Although our Lyapunov function looks similar
to that in [6, 11, 12, 2], it is based on node weights as
opposed to weights on the individual edges in the graph.
Moreover, we can show that the LHPF class of policies are
not even required to be maximal in every time slot, whereas
the policies considered in [11, 12, 2] are.

The general research on the delay analysis of scheduling poli- 
cies has progressed in the following main directions: 

• Heavy traffic regime using fluid models: Fluid models 
have typically been used to either establish stability 
of the system or to study the workload process in the 
heavy traffic regime. It has been shown in [28] that 
the MWM policy minimizes the workload process for 
a generalized switch. Furthermore, [57] proves multi- 
plicative state space collapse of a family of scheduling 
algorithms related to MWM in the heavy traffic regime 
and conjectures an optimal algorithm MWM-0+. 

• Stochastic Bounds using Lyapunov drifts: This method 
is developed in [161 [8] [H] and is used to derive upper 
bounds on the average queue length for these systems. 
However, these results are order results and provide
only a limited characterization of the delay of the
system. For example, it has been shown in [20] that the
bounds in [16, 26] are $O(N)$ bounds and hence not very
useful. It is also shown that it is possible to achieve
$O(\log N)$ delay.

As noted in [19], the MVM policy combines the benefits of a
maximum size algorithm with those of a maximum weight
algorithm, while lending itself to simple implementation in
hardware. In MVM, each weight is a function of queue
lengths (the sum over all edges that touch a node), and hence
it retains the high instantaneous throughput of maximum size
matchings while guaranteeing high throughput even when the
arrival traffic is non-uniform. We have in fact characterized
a class of policies much larger than the MVM policy, with
potentially lower complexity and equivalent performance
benefits.

2. PRELIMINARIES 

Switches: This paper is about scheduling in (the standard)
input-buffered crossbar switches, which we now briefly
describe. An $N_1 \times N_2$ input-buffered crossbar switch
contains $N_1$ input ports and $N_2$ output ports. The system
operates in discrete time slots. In each slot, packets may
arrive at the input ports; each packet has an output port it
needs to be transferred to. Packets have to be transferred
from inputs to outputs, under the following constraint: in
any one time slot each input port can send at most one packet
to at most one output port, and each output port can receive
at most one packet from at most one input port. The
scheduling problem is to determine how to transfer packets
subject to these constraints.

Notation: Switch scheduling can be modeled as the problem
of finding matchings in bipartite graphs, one in every
time slot. Consider G(s), the graph at slot s. G(s) is a
bipartite graph with input ports on one side and output ports
on the other. As mentioned in the introduction, we will use
"nodes" and "ports" interchangeably. There is an edge (i, j)
in G(s) if and only if there is at least one packet at input
i that has output j as its destination. The scheduling
algorithm finds a matching M(s) in G(s); then, for every edge
$(i, j) \in M(s)$, one packet is transferred from i to j.
These packets are then considered to have left the system.
A scheduling policy is a rule to pick the matching M(s), in
every slot s, based on the state of the system. For any input
port i, $q_i(s)$ denotes the total number of packets at i.
Similarly, for any output port j, $q_j(s)$ denotes the total
number of packets in the system (i.e. at all inputs) that are
waiting to be transferred to j. We will not need to refer to
the queues on individual edges. We will however often refer
to the total queue at a port as the "weight" of that port;
"heavy" ports have more packets in their queues than
"lighter" ports.

We now state a couple of well-known results, from [17, 24, 9],
which we will use in the proofs of this paper.

Lemma 1 (Hall's Condition). Let G be any bipartite
graph, with the two partitions being $V_1$ and $V_2$. Let
$S_1 \subseteq V_1$ be any subset of one partition. Then,
there exists a matching in G that matches every node in $S_1$
if and only if for every further subset $S \subseteq S_1$, we
have that $|\mathcal{N}(S)| \ge |S|$. Here the neighborhood
$\mathcal{N}(S)$ is the set of all nodes in $V_2$ that have an
edge to some node in S.
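
As a quick illustration of Lemma 1, the following brute-force sketch checks Hall's condition on a tiny graph. It enumerates every subset, so it is exponential and only meant for small examples; the adjacency map and node names are hypothetical.

```python
from itertools import combinations


def neighborhood(adj, subset):
    """All nodes on the other side that have an edge to some node in subset."""
    result = set()
    for v in subset:
        result |= adj[v]
    return result


def hall_condition_holds(adj, s1):
    """Check |N(S)| >= |S| for every non-empty subset S of s1."""
    nodes = list(s1)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if len(neighborhood(adj, subset)) < k:
                return False
    return True


# Hypothetical graph: inputs a, b, c on one side, outputs x, y on the other.
adj = {"a": {"x", "y"}, "b": {"x"}, "c": {"x"}}
can_match_a_and_b = hall_condition_holds(adj, {"a", "b"})   # a-y, b-x works
can_match_b_and_c = hall_condition_holds(adj, {"b", "c"})   # both need x
```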



Lemma 2. Let G be any bipartite graph, with the two
partitions being $V_1$ and $V_2$. Let $S_1 \subseteq V_1$, and
suppose there exists a matching $M_1$ that matches all nodes
in $S_1$. Similarly, let $S_2 \subseteq V_2$ and suppose there
exists a matching $M_2$ that matches all nodes in $S_2$. Then
there exists a matching M that matches all nodes in both
$S_1$ and $S_2$.

Note that in Lemma 2, M may not match the nodes in the
two sets to each other; just that each node in $S_1 \cup S_2$
will be matched to some node in the graph.

Graph-theoretic preliminaries: 

We now formally define the terms we will use. All are
standard, except for the definition of "absorbing paths".
Throughout, we consider a node-weighted graph. The length of a
path is the number of edges it contains. The weight w(M)
of a matching M is the total weight of all the nodes it
matches. For any two matchings $M_1$ and $M_2$, the symmetric
difference, denoted by $M_1 \Delta M_2$, is the set of edges
in one of the two matchings, but not in both. It is well known
that $M_1 \Delta M_2$ is always the node-disjoint union of
paths and even-length cycles. Finally, given a matching M and
path P, the set
$M \oplus P = (M \setminus (M \cap P)) \cup (M^c \cap P)$
denotes the edges obtained by "flipping" the edges in P. We
now define the two scenarios of our interest where the
resulting set $M \oplus P$ is also a matching.

Figure 3: Augmenting and absorbing paths. Edge
(b, i) is in the matching M. a - i - b is an absorbing
path, a - i - b - j is an augmenting path.

Given a matching M, and any node i not matched by M,

1. An augmenting path from i is any odd-length path P
whose every alternate edge is in M, has i as one endpoint,
and ends at an unmatched node (say j).

Note that now $M \oplus P$ matches every node M does, and
in addition matches i and j as well. Thus its weight is
$w(M \oplus P) = w(M) + w_i + w_j$, which is strictly bigger
than w(M).

2. An absorbing path from i is any even-length path P
whose every alternate edge is in M, has i as one endpoint,
and whose last endpoint, say j, has weight $w_j < w_i$.

Note that now $M \oplus P$ matches every node M does
except j, which is replaced by i. Thus it has strictly
higher weight: $w(M \oplus P) = w(M) + w_i - w_j > w(M)$.

Fig. 3 illustrates the idea of augmenting and absorbing paths.
a - i - b is an absorbing path from a since it is an
even-length path ending in a node with smaller weight.
a - i - b - j is an augmenting path from a since it is an
odd-length path and ends in an unmatched node j.
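
The flipping operation and the two weight identities can be checked on the configuration of Fig. 3. In the sketch below the node names follow the figure, but the node weights are assumed values chosen only so that b is lighter than a; the helper names `flip` and `weight` are likewise illustrative.

```python
def flip(matching, path):
    """Return M xor P: drop the path edges that are in M, add those that are not."""
    path_edges = {frozenset(edge) for edge in zip(path, path[1:])}
    return matching ^ path_edges


def weight(matching, w):
    """Total weight of the nodes matched by the matching."""
    return sum(w[node] for edge in matching for node in edge)


# Edge (b, i) is in M, as in Fig. 3. The weights are hypothetical.
w = {"a": 5, "b": 2, "i": 4, "j": 1}
M = {frozenset({"b", "i"})}

absorbed = flip(M, ["a", "i", "b"])        # even length: a replaces b
augmented = flip(M, ["a", "i", "b", "j"])  # odd length: a and j are added
```

With these assumed weights, the absorbing flip changes the weight by w_a - w_b and the augmenting flip by w_a + w_j, matching the two identities stated above.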

3. CLEARANCE TIME AND CRITICAL PORT 
POLICIES 

In the clearance time problem, the queues in the system 
have an initial loading, and there are no arrivals. We are 
interested in scheduling so as to minimize the clearance time, 
which is the time before every packet in the initial loading 
has exited the system. In the following, qi(s) denotes the 
remaining packets at port i immediately after time slot s, 
and qi(0) the initial loading. 

Since at most one packet can be scheduled at any given port
in a time slot, an obvious lower bound on the clearance time
$\tau$ is

$$\tau \ge \max_i q_i(0) \qquad (1)$$

It is known that this lower bound is tight, based on the
following "batch" policy. We first briefly describe this policy,
and then describe a more elegant slot-by-slot policy. Let
$\tau^* = \max_i q_i(0)$.



Batch policy: This is based on the Birkhoff-VonNeumann
theorem. Let $N = \max\{N_1, N_2\}$, and consider the
$N \times N$ matrix L which, for $i \le N_1$ and $j \le N_2$,
has entries $L(i,j) = q_{ij}(0)/\tau^*$, where $q_{ij}(0)$ is
the number of packets waiting at input i for output j in the
initial loading. All the other entries of L, i.e. all L(i,j)
for which either $i > N_1$ or $j > N_2$, are 0. It is clear
that L is a sub-stochastic matrix (i.e., the sum of every row
and every column is less than or equal to 1). The
Birkhoff-VonNeumann theorem says that any such matrix can be
represented as a convex combination of (sub)permutation
matrices; each (sub)permutation matrix corresponds to a
matching in the switch. Furthermore, the fact that every
entry of L is an integer multiple of $1/\tau^*$ implies that a
batch of at most $\tau^*$ such matchings will be needed. Thus
the lower bound is tight, and can be achieved by this batch
of matchings [31, 10, 20].
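
The construction of L can be illustrated with exact rational arithmetic. The sketch below only builds the matrix and lets one check that it is sub-stochastic; it does not perform the Birkhoff-VonNeumann decomposition itself. The function name `batch_matrix` and the 3x2 loading are assumptions for illustration, and the loading is assumed to be non-empty.

```python
from fractions import Fraction


def batch_matrix(q):
    """Return (L, tau) for an initial loading q[i][j] with N1 rows and N2 columns.

    L is the N x N matrix (N = max(N1, N2)) with L[i][j] = q[i][j] / tau inside
    the original N1 x N2 block and 0 elsewhere. Assumes at least one packet.
    """
    n1, n2 = len(q), len(q[0])
    n = max(n1, n2)
    row_loads = [sum(row) for row in q]
    col_loads = [sum(row[j] for row in q) for j in range(n2)]
    tau = max(row_loads + col_loads)  # heaviest port load, the bound in (1)
    L = [[Fraction(q[i][j], tau) if i < n1 and j < n2 else Fraction(0)
          for j in range(n)] for i in range(n)]
    return L, tau


# Hypothetical 3x2 loading (3 inputs, 2 outputs).
q = [[2, 1],
     [0, 3],
     [1, 0]]
L, tau = batch_matrix(q)
row_sums = [sum(row) for row in L]
col_sums = [sum(L[i][j] for i in range(len(L))) for j in range(len(L))]
```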

The Birkhoff-VonNeumann approach above gives us an algo- 
rithm for clearing out a given batch of packets, but it would 
be more practical to have a "slot-by-slot" solution: one in 
which the matching at each time can be easily determined 
from the current loading. We now show that the class of crit- 
ical port policies is exactly what is needed for a slot-by-slot 
solution. 



Proposition 1. A scheduling policy is clearance-time
optimal, i.e. it achieves the lower bound (1), if and only if
it is a critical port policy.

Proof: Suppose $\pi$ is a clearance-time optimal policy. This
means that at any time slot $s < \tau^*$, every port i has
$q_i(s) \le \tau^* - s$; otherwise, the port cannot be emptied
by time slot $\tau^*$. Also, it is clear that all the ports
with initial load $q_i(0) = \tau^*$ will now have
$q_i(s) = \tau^* - s$; thus the weight of the critical ports
at time slot s is $\tau^* - s$. If any one of these critical
ports is excluded by $\pi$ in slot s, it will have a total
queue of $\tau^* - s$ at time slot s + 1, and hence cannot be
drained by time $\tau^*$. Thus every clearance-time optimal
policy is a critical port policy.

Conversely, suppose now that $\pi$ is a critical-port policy.
It is easy to see that in any time slot the maximum load at
any port will decrease by exactly one. This is because the
ports with the maximum loads are the critical ports, and
every one of them will be scheduled by $\pi$ in slot s. Hence
the maximum load reaches zero after exactly $\tau^*$ slots,
which achieves the lower bound (1). ■

Corollary 1. Given any set of queues, there exists a 
critical-port matching. 



Procedure for Critical Port Matching 

INPUT: A node-weighted graph, and any initial matching
$M_0$ (which could be empty)

OUTPUT: $M^*$, a critical-port matching

• Set $l = 1$

• While there exists a critical port i not matched by $M_{l-1}$,

  - Find P, an augmenting path or absorbing path
    from i with respect to $M_{l-1}$.

  - Set $M_l = M_{l-1} \oplus P$ and increment $l = l + 1$



Lemma 3. Given any matching M, and a critical port
i not matched by M, there exists an augmenting path or
absorbing path P from i.

Proof: By Corollary 1 there exists a matching $M^*$ that
matches all critical ports. In particular, it matches i.
Consider now the symmetric difference $M \Delta M^*$, which
contains node-disjoint paths and cycles; since i is matched by
$M^*$ but not by M, i will be the endpoint of a path P in
$M \Delta M^*$. If P is of odd length, it is an augmenting
path, and we are done. If P is of even length, let j be the
other endpoint of P. Now, P begins at i with an edge in $M^*$,
so it ends in j with an edge in M. Also, there is no edge in
$M^*$ touching j, because j is an endpoint in the symmetric
difference. This means that j cannot be a critical port,
because $M^*$ matches every critical port. Since i is
critical, this means that $w_i > w_j$, which means that P is
an absorbing path. ■

Correctness of Procedure: Suppose at iteration l, we
have that $M_{l-1}$ does not match critical port i. Lemma 3
guarantees that an augmenting or absorbing path P from i
will be found. Also, if $M_l = M_{l-1} \oplus P$ then i will
be matched by $M_l$. Thus all we need to show is that any
critical port that is matched by $M_{l-1}$ remains matched by
$M_l$. This is so because: if P is augmenting, $M_l$ matches
all nodes matched by $M_{l-1}$. If P is absorbing, the node j
removed at the expense of i is not critical, because
absorbing requires that $w_j < w_i$. Thus the procedure gives
us the desired critical port matching.
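
One way to realize the procedure is a breadth-first alternating-path search from each unmatched critical port. The sketch below is an illustrative implementation under that assumption, not the paper's own code; the queue-matrix representation and the names `build_graph`, `add_port`, `critical_port_matching` and `clearance_time` are hypothetical. It then drains a loading slot by slot to illustrate Proposition 1.

```python
from collections import deque


def build_graph(q):
    """Adjacency lists and port weights from q[i][j] (packets at input i for output j)."""
    adj, w = {}, {}
    for i, row in enumerate(q):
        for j, packets in enumerate(row):
            a, b = ("in", i), ("out", j)
            w[a] = w.get(a, 0) + packets
            w[b] = w.get(b, 0) + packets
            adj.setdefault(a, [])
            adj.setdefault(b, [])
            if packets > 0:
                adj[a].append(b)
                adj[b].append(a)
    return adj, w


def augment(mate, parent, x, u):
    """Flip the alternating path from the search root to x, then match x with u."""
    while x is not None:
        old = mate.get(x)          # previous partner of x (None at the root)
        mate[x], mate[u] = u, x
        x, u = parent[x], old


def add_port(adj, w, mate, i):
    """Match unmatched node i via an augmenting or absorbing path, if one exists."""
    parent = {i: None}
    frontier = deque([i])
    while frontier:
        x = frontier.popleft()
        for u in adj[x]:
            if u not in mate:      # augmenting path ends at unmatched u
                augment(mate, parent, x, u)
                return True
            v = mate[u]
            if v in parent:
                continue
            parent[v] = x
            if w[v] < w[i]:        # absorbing path: lighter node v is dropped
                del mate[v]
                augment(mate, parent, x, u)
                return True
            frontier.append(v)
    return False


def critical_port_matching(adj, w, initial=None):
    mate = dict(initial or {})
    top = max(w.values())
    if top == 0:
        return mate
    for i in [n for n in w if w[n] == top]:
        if i not in mate and not add_port(adj, w, mate, i):
            raise RuntimeError("no path found, which would contradict Lemma 3")
    return mate


def clearance_time(q):
    """Number of slots a critical-port policy needs to drain loading q."""
    q = [row[:] for row in q]
    slots = 0
    while any(any(row) for row in q):
        adj, w = build_graph(q)
        mate = critical_port_matching(adj, w)
        for node, partner in mate.items():
            if node[0] == "in":
                q[node[1]][partner[1]] -= 1
        slots += 1
    return slots


# Hypothetical 4x4 loading; the heaviest ports (outputs 0..2) hold 4 packets.
loading = [[1, 1, 1, 0], [3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0]]
slots_needed = clearance_time(loading)
```

On this loading the heaviest port holds 4 packets, so the lower bound (1) is 4 slots, and the simulation should use exactly that many.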

3.1 Clearance-time of other Policies 

We now provide an example to show that edge-weight based
policies like MWM, Greedy weighted maximal matching
(GMM) and MSM are not clearance time optimal.



Proof of Corollary 1: Given the set of queues, consider the
clearance time problem with these queues as the initial
loading. We know that there exists a policy, e.g., based on
the Birkhoff-VonNeumann decomposition, that achieves the
bound (1). By Proposition 1, this policy has to be a
critical-port policy. Hence, in the first time slot it will
have a critical-port matching. This implies such a matching
exists for our set of queues. ■

The Procedure for Critical Port Matching finds such a
critical-port matching, given any set of queues.



Consider an $N \times N$ switch with the following
configuration. Input port 1 has one packet each destined for
output ports 1 through N - 1. Input port i, for
$2 \le i \le N$, has N - 1 packets destined for output port
i - 1. The clearance time $\tau^*$ for the above
configuration is N.

Let us consider how MWM schedules packets in the given
system. At any time, no more than N - 1 input ports can be
matched under the switch constraints. The maximum weight
matching policy does not match input port 1 for the first
N - 2 slots since, for any output port j, the edge weight
$q_{1j}$ is smaller than $q_{(j+1)j}$. So, after N - 2
slots, the weight of input port 1 is still N - 1. Depending
on how ties are broken, MWM will take either N or N - 1 more
slots to clear all the packets in the system. Hence, MWM
clears the packets in at least 2N - 3 slots whereas the
clearance time is N. This example shows that MWM can take
twice as much time to clear the system for large N. The GMM,
MWM-α and MWM-0+ policies will schedule this system in
the same manner as discussed above and take at least 2N - 3
slots to empty all the packets in the system.

Figure 4: An example where MWM, GMM, MSM
and MWM-0+ are not clearance-time optimal
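
The example can be reproduced numerically for small N. The sketch below finds a maximum weight matching by brute force over all permutations, which is only feasible for small switches and is an assumption made for simplicity, not the algorithm used in practice. Ties are broken by whichever permutation is enumerated first, so only the lower bound of 2N - 3 slots is claimed. All function names are illustrative.

```python
from itertools import permutations


def example_queues(n):
    """Loading of the example: rows are inputs, columns are outputs (0-indexed)."""
    q = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        q[0][j] = 1            # input port 1: one packet for outputs 1..N-1
    for i in range(1, n):
        q[i][i - 1] = n - 1    # input port i >= 2: N-1 packets for output i-1
    return q


def optimal_clearance_time(q):
    """Lower bound (1): the heaviest port load."""
    n = len(q)
    rows = [sum(row) for row in q]
    cols = [sum(q[i][j] for i in range(n)) for j in range(n)]
    return max(rows + cols)


def max_weight_matching(q):
    """Brute-force MWM over all permutations; keeps only non-empty edges."""
    n = len(q)
    best = max(permutations(range(n)),
               key=lambda p: sum(q[i][p[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n) if q[i][best[i]] > 0]


def mwm_clearance_time(q):
    q = [row[:] for row in q]
    slots = 0
    while any(any(row) for row in q):
        for i, j in max_weight_matching(q):
            q[i][j] -= 1
        slots += 1
    return slots


n = 5
q = example_queues(n)
tau_star = optimal_clearance_time(q)
mwm_slots = mwm_clearance_time(q)
```

For N = 5 the optimal clearance time is 5, while the MWM simulation needs at least 2N - 3 = 7 slots regardless of how the later ties are resolved.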

What is attractive is that LHPF policies being critical port 
policies are also clearance-time optimal, which we prove in 
the next lemma. 

Lemma 4. Any LHPF matching is also a Critical port 
matching, and hence any LHPF policy is also a Critical port 
policy. 

Proof: By Corollary 1, for any set of queues there exists a
critical port matching. Hence the optimal threshold (defined
in Section 1.1) can be no larger than $\tau^*$, the weight of
the critical ports. Since an LHPF matching matches all ports
with weight at or above the optimal threshold, it matches all
critical ports, and hence any LHPF policy is a Critical port
policy. ■

4. LHPF POLICIES 

We now take a closer look at LHPF matchings, and LHPF
policies. Recall the definition of the threshold l(.) from the
introduction. LHPF matchings were defined in the
introduction as those in which all the ports whose weight is
at or above the optimal threshold are matched. The optimal
threshold is the lowest possible threshold for a graph. We
now present a structural sufficient condition for a matching
to be LHPF. This will enable us to both develop algorithms
for LHPF matchings, and to understand their properties.

Lemma 5. M is an LHPF if at least one of its heaviest 
unmatched nodes has no augmenting path or absorbing path. 

Remarks: Note that the condition just concerns one of
the heaviest unmatched nodes; every other unmatched node
(heaviest or otherwise) is free to have augmenting/absorbing
paths. This is a reflection of the fact that LHPF matchings
need not even be maximal.

Lemma 5 is a sufficient condition for a matching to be LHPF,
but it is not necessary. This is because if the heaviest
unmatched node is below the optimal threshold, then it is not
required to be matched for the matching to be LHPF. For
example, consider the graph in Fig. 2. Matching II, for
example, is an LHPF matching, although there is an augmenting
path l - a - i - c which results in matching III, which is
again LHPF.

Proof: We will prove the contrapositive, i.e. we will prove
that if M is not an LHPF then every heaviest unmatched
node will have an augmenting or absorbing path. Let w be
the weight of the heaviest node not matched by M, and let
$\mathcal{U}$ be the set of heaviest unmatched nodes (i.e.,
all unmatched nodes with weight w).

Now, by assumption, M is not LHPF. Let $M^*$ be any LHPF
matching. It follows that the threshold of $M^*$ is strictly
lower than that of M, which can only happen if $M^*$
schedules all nodes of weight w or more, and in particular,
all nodes in the set $\mathcal{U}$. Consider now the
symmetric difference $M \Delta M^*$, which contains
node-disjoint paths and cycles. Every $i \in \mathcal{U}$
is matched by $M^*$ but not by M. Thus each
$i \in \mathcal{U}$ will be an endpoint of a path, say $P_i$,
in $M \Delta M^*$.

Consider any such i and $P_i$. If $P_i$ is of odd length, it
is an augmenting path. If $P_i$ is of even length, let j be
its other endpoint. Because $P_i$ is of even length, j is not
matched by $M^*$. Since $M^*$ matches every node of weight at
least w, this means that $w_j < w = w_i$, which implies that
$P_i$ is an absorbing path. ■

In the same way that augmenting-path characterizations
provide algorithms for edge-weighted matching problems (like
maximum cardinality, maximum-edge-weight etc.), Lemma 5
can be used as a source for algorithms to find LHPF matchings
in node-weighted graphs; it can also be used to modify
a (potentially non-LHPF) matching to obtain an LHPF one.
We now describe a simple procedure for either of these tasks.

Procedure for LHPF 

INPUT: a node-weighted graph, and any initial matching
$M_0$ (which could be empty, or generated by some other
algorithm)

OUTPUT: $M^*$, an LHPF matching

• Set $l = 1$

• At iteration l,

  - IF $M_{l-1}$ matches all nodes, set $M^* = M_{l-1}$ and
    BREAK.

  - Pick any heaviest unmatched node i in $M_{l-1}$, and
    try to find an augmenting or absorbing path P
    from i.

  - IF such a P can be found, set $M_l = M_{l-1} \oplus P$
    and $l = l + 1$.

  - ELSE set $M^* = M_{l-1}$, and BREAK loop.



Remarks: The above description is just a conceptual
procedure; efficient implementations could potentially rely on
optimizations (e.g. parallelism, as was done in [13] for
max-cardinality matching). We emphasize, rather, a more
interesting aspect of the above procedure: it allows us to
make LHPF matchings out of non-LHPF ones via post-processing.
In particular, given a matching $M_0$ produced by a simple
policy, one can go down the sequence of ports in decreasing
order of weight, and add them to the current matching: either
via augmentations, or at the expense of some node with
strictly lower weight. The procedure stops at the point of
first failure to find P; it can be expected, on average, that
if the initial matching is maximal (say for example if it is
the greedy matching), then the number of nodes that need to
be inspected may be small. As mentioned in Section 1.1, this
could be of advantage in several settings.

This procedure can also be used as a pre-processing step,
beginning with an empty matching, to give an LHPF matching
with possibly many unmatched nodes (those which are below
the optimal threshold). This matching can then be extended
by a low complexity scheduler, for example to a maximal
matching, to improve the delay performance.

Correctness of Procedure: The correctness of the above
sequential adding procedure follows from Lemma 5. In
particular, if there are no unmatched nodes, then the
matching is clearly LHPF. If the procedure stops at iteration $l$,
it means that for the matching $M_{l-1}$, and a heaviest
unmatched node $v$, there is no augmenting or absorbing path
from $v$. This is exactly the condition under which Lemma 5
guarantees that the matching is LHPF.

For clarity, we now describe a policy that is not an LHPF 
policy. Suppose we do the following: go down the sequence 
of ports, recursively matching nodes if any neighbor is free, 
but not changing the edges already previously matched. Even 
if we go all the way to the end, this policy is not LHPF be- 
cause it may exclude a port that would have been possible 
to schedule by changing the matchings of heavier ports that 
came before it. It is thus important that the ports are added 
via augmenting or absorbing paths. 
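A small hypothetical instance makes the failure concrete. The sketch below implements the sequential policy just described (the function name and the example graph are assumptions made for illustration). Port `b` ends up excluded, even though rematching `a` to `y` would free `x` for `b`.

```python
def sequential_no_rearrangement(adj, weight):
    """Visit ports from heaviest to lightest and match each one to any free
    neighbor, never changing edges that were chosen earlier."""
    mate = {}
    for v in sorted(adj, key=lambda x: -weight[x]):
        if v in mate:
            continue
        for u in adj[v]:
            if u not in mate:
                mate[v], mate[u] = u, v
                break
    return mate

# Hypothetical 2 x 2 example: inputs a, b and outputs x, y.
# Queues a->x = 2, a->y = 1, b->x = 2 give the port weights below.
adj = {"a": ["x", "y"], "b": ["x"], "x": ["a", "b"], "y": ["a"]}
weight = {"a": 3, "b": 2, "x": 4, "y": 1}

mate = sequential_no_rearrangement(adj, weight)
# Only the edge (a, x) is scheduled; b and y stay unmatched, although
# the matching {(a, y), (b, x)} would schedule every port.
```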

One example of an LHPF matching that has been previously 
studied is Maximum Vertex-weighted Matching. 

Lemma 6. $M$ is an MVM if and only if there is no
augmenting path or absorbing path from any of its unmatched
nodes.

Remarks: Contrast Lemma 6 with Lemma 5. MVM requires
that all unmatched nodes have no augmenting or absorbing
path. LHPF requires only that one of the heaviest
unmatched ports satisfy this property.

Proof: Let $M$ be an MVM. If there exists some path $P$ that
is either an augmenting path or an absorbing path, then the
matching $M \oplus P$ will have strictly higher weight than $M$,
which is a contradiction. Thus no such $P$ exists.

Now suppose that $M$ has no augmenting or absorbing paths.
Suppose also that it is not an MVM. Let $M^*$ be any MVM,
and consider $M \triangle M^*$; it is a collection of node-disjoint
cycles and paths. Consider any path $P$ in this collection. If
$P$ is of odd length, it is either an augmenting path for $M^*$,
or for $M$. The latter possibility is ruled out by assumption,
and we just proved that the former is ruled out too: $M^*$
is an MVM, and so cannot have an augmenting path. So
there are no odd length paths. This means there has to be
an even length path $P_1$ in the collection whose endpoints
have unequal weights; else the weights of $M$ and $M^*$ will
be equal. However, depending on which endpoint is heavier,
this $P_1$ is either an absorbing path for $M$ or $M^*$; again,
either possibility is ruled out. ■

Corollary 2. Any MVM matching is an LHPF matching,
and hence the MVM scheduling policy is an LHPF policy.

Proof: By Lemma 6, any MVM will not have an augmenting
or absorbing path from any unmatched node, including any
of its heaviest unmatched nodes. By Lemma 5, this implies
that it is an LHPF matching. ■

The complexity of MVM is $O(N^{2.5})$, and the policy is
simple to implement in hardware. Many heuristics have
been developed for MSM and they can be readily tuned
to compute approximate MVMs. With the characterization
of LHPF policies, which is a much bigger class, we expect
that it would be much easier to develop heuristics for LHPF
matchings. MVM is a special case of MWM. The proof uses
the fact that an MVM is an MWM on a graph in which the
weight of the edge connecting input node $i$ and output node
$j$ has been chosen as follows:

$$w_{ij}(n) = \begin{cases} q_i(n) + q_j(n) & \text{if } q_{ij}(n) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
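The equivalence can be checked by brute force on a tiny instance. The sketch below is illustrative only: the queue lengths are hypothetical, the function names are assumptions, and the enumeration is exponential, so it is not a practical scheduler. It relies on the fact that, with edge weight $q_i + q_j$, the edge weight of any matching equals the total weight of the ports it covers.

```python
from itertools import combinations

def port_weights(q):
    """Port weight = total packets queued on the edges touching that port."""
    w = {}
    for (i, j), packets in q.items():
        w[i] = w.get(i, 0) + packets
        w[j] = w.get(j, 0) + packets
    return w

def edge_weights(q):
    """w_ij = q_i + q_j on non-empty edges; empty edges are left out."""
    w = port_weights(q)
    return {(i, j): w[i] + w[j] for (i, j), packets in q.items() if packets > 0}

def matchings(edges):
    """Yield every set of pairwise node-disjoint edges, including the empty set."""
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            nodes = [n for e in subset for n in e]
            if len(nodes) == len(set(nodes)):
                yield subset

def vertex_weight(m, w):
    return sum(w[n] for e in m for n in e)

def edge_weight(m, ew):
    return sum(ew[e] for e in m)

# Hypothetical queue lengths for a 2 x 2 switch.
q = {("i1", "o1"): 2, ("i1", "o2"): 1, ("i2", "o1"): 2, ("i2", "o2"): 0}
w, ew = port_weights(q), edge_weights(q)
all_m = list(matchings(list(ew)))
mvm = max(all_m, key=lambda m: vertex_weight(m, w))   # max vertex weight
mwm = max(all_m, key=lambda m: edge_weight(m, ew))    # max edge weight
```

Here both maximizations select the matching {(i1, o2), (i2, o1)}.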

Lemma 7. If G has a perfect matching (i.e. one that 
matches every port), then any LHPF matching also has to 
be perfect. 

Proof: The existence of a perfect matching means that the 
optimal threshold for a matching is 1. Any non-perfect 
matching will have a higher threshold, and hence not be 
an LHPF. ■ 

5. THROUGHPUT OPTIMALITY OF LHPF 
POLICIES 

In this section we show that any LHPF policy is throughput
optimal. Let the system be empty at time 0. Let $a_i(n)$
denote the cumulative number of packets that have arrived
at an input port $i$ up to time slot $n$. Similarly, $a_j(n)$
denotes the cumulative number of packets that have arrived
in the system, destined for output port $j$, up to time slot
$n$. For each edge in the matching, one packet is removed at
both the nodes touching the edge. With this understanding,
henceforth, we shall not distinguish between an output and
input port. We assume the convention that $a_i(0) = 0$. We
assume that the arrival processes $a_i(\cdot)$ satisfy a strong law
of large numbers (SLLN): with probability one,

$$\lim_{n \to \infty} \frac{a_i(n)}{n} = \lambda_i \tag{3}$$



For any port, input or output, let $\lambda_i$ be the average rate of
arrival of packets to port $i$. Define

$$\epsilon^* = \min_i \, (1 - \lambda_i)$$

The capacity region is $\{\lambda : \lambda_i < 1 \text{ for all } i\}$, which means
that $\epsilon^* > 0$.
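As a small numerical illustration, the sketch below computes the port loads and the slack from a matrix of per-edge arrival rates. The rate matrices and the function names are hypothetical. Row sums are taken as input-port loads and column sums as output-port loads.

```python
def port_loads(rates):
    """rates[i][j] = mean arrivals per slot from input i to output j.
    Returns the loads of all input ports followed by all output ports."""
    inputs = [sum(row) for row in rates]
    outputs = [sum(col) for col in zip(*rates)]
    return inputs + outputs

def slack(rates):
    """Minimum over all ports of (1 - load); positive inside the capacity region."""
    return min(1 - load for load in port_loads(rates))

inside = [[0.5, 0.25], [0.25, 0.5]]     # every port carries load 0.75
boundary = [[0.5, 0.5], [0.25, 0.0]]    # input port 0 carries load 1.0
```

With these values `slack(inside)` is 0.25, while `slack(boundary)` is 0, so the second rate matrix lies outside the (open) capacity region.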

Fluid Model

We develop a fluid limit model following the development
in [4]. Let $q_i(n)$ denote the weight at port $i$ and $d_i(n)$ be
the number of packets that departed from port $i$ by time
slot $n$. Let $h_M(n)$ be the number of slots in which matching
$M \in \mathcal{M}$ has been scheduled, where $\mathcal{M}$ is the set of all
matchings (not necessarily maximal). Then $h_M$ is a
non-decreasing function. Also note that by definition of $G(n)$,
$M$ can schedule only non-zero edges in the system. $M_i$
indicates if matching $M$ schedules port $i$. Note that $q_i(\cdot)$ and
$d_i(\cdot)$ evolve according to the following:

$$q_i(n) = q_i(0) + a_i(n) - d_i(n)$$

$$\sum_{M \in \mathcal{M}} h_M(n) = n$$

We define $a_i(t)$ for a non-negative real number $t$ by
interpolating the value of $a_i$ between time $\lfloor t \rfloor$ and $\lfloor t \rfloor + 1$. We also
define $q_i(t)$ and $d_i(t)$ in the same way by linear interpolation
of the corresponding values at time $\lfloor t \rfloor$ and $\lfloor t \rfloor + 1$. Then,
by using the techniques of Theorem 4.1 of [5], we can show
that, for almost all sample paths and for every positive sequence
$x_k \to \infty$, there exists a subsequence $x_{k_l}$ with $x_{k_l} \to \infty$ such
that the following convergence holds uniformly over compact
intervals of time $t$:

For all $i$ and all $M \in \mathcal{M}$,

$$\frac{a_i(x_{k_l} t)}{x_{k_l}} \to \lambda_i t, \qquad \frac{q_i(x_{k_l} t)}{x_{k_l}} \to Q_i(t) \tag{4}$$

$$\frac{d_i(x_{k_l} t)}{x_{k_l}} \to D_i(t) \tag{5}$$

$$\frac{h_M(x_{k_l} t)}{x_{k_l}} \to H_M(t) \tag{6}$$



The system $(D, H, Q)$ is called the fluid limit, and queues
evolve in the fluid limit as follows:

$$Q_i(t) = Q_i(0) + \lambda_i t - D_i(t)$$

$$\sum_{M \in \mathcal{M}} H_M(t) = t$$

$D$, $H$ and $Q$ are absolutely continuous functions and are
differentiable at almost all times $t \ge 0$ (called regular times).
It follows that

$$\frac{d}{dt} Q_i(t) = \lambda_i - \frac{d}{dt} D_i(t) = \lambda_i - \sum_{M \in \mathcal{M}} M_i \dot{H}_M(t) \tag{7}$$
The following lemma from [3] establishes the connection be- 
tween the stability of the switch and the fluid model. 



Lemma 8. A switch operating under a matching algorithm 
is rate stable if the corresponding fluid model is weakly sta- 
ble. 



Lemma 9. The fluid model of a switch operating under a
matching algorithm is weakly stable if for every fluid model
solution $(D, H, Q)$ with $Q(0) = 0$, $Q(t) = 0$ for almost all $t \ge 0$.



Define the Lyapunov function

$$V(Q(t)) = \max_i Q_i(t)$$

Note that in the definition of $V$ the maximum is taken over
all ports, input and output.
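To make the distinction from per-queue functions concrete, the sketch below evaluates the same maximum on a hypothetical matrix of queue lengths (the unscaled system rather than the fluid limit). The matrix and the function names are assumptions for illustration.

```python
def port_weights(q):
    """q[i][j] = packets queued at input i for output j.
    Returns (input port weights, output port weights)."""
    rows = [sum(r) for r in q]
    cols = [sum(c) for c in zip(*q)]
    return rows, cols

def lyapunov(q):
    """Maximum port weight, taken over input and output ports together."""
    rows, cols = port_weights(q)
    return max(rows + cols)

q = [[3, 0],
     [2, 1]]
heaviest_port = lyapunov(q)                   # output port 0 holds 3 + 2 packets
longest_queue = max(max(row) for row in q)    # largest individual queue
```

In this instance the heaviest port has weight 5 while the longest individual queue has length 3, so the two quantities can differ.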

Remarks

• The Lyapunov function used by [6] for the analysis of
the GMM policy also looks at the maximum queue length.
The novelty of our proof is that we do not need to
look at the individual queue lengths. Our Lyapunov
function is based on port weights. Another difference
is that while the analysis in [5] depends on the fact
that GMM is a maximal matching, our proof works
for all LHPF policies, which are not even required to
be maximal, in general.

• Our proof of stability is more subtle than the proof
of stability for the MWM policy [4]. Note that the
maximum weight matching in the graph remains the
maximum weight matching in the corresponding fluid
model. However, the ports that are critical in a given
interval of time $(t, t + \delta)$ in the fluid model may not
be critical on a slot by slot basis in the actual system.
Hence, for example, a critical port policy may not be
able to schedule all the ports that are critical in the
fluid model.



Proof Intuition

Our proof is based on the observation that all the ports
that are critical (heaviest) in the fluid limit may not remain
heaviest in the neighborhood of time $t$, but they continue to
be above a certain threshold. We show that the optimal
threshold must be below this threshold and hence all ports
that are critical in the fluid limit are scheduled in every
time slot around $t$. We prove in Lemma 10 that an LHPF policy
schedules all the ports that are critical in the corresponding
fluid model and hence is throughput optimal. Note that the
LHPF policy does not need to know which ports are critical
in the fluid limit.



Theorem 1. Any LHPF policy is throughput optimal.

Proof: Since $V(Q(t))$ is a non-negative function, to show
that $V(Q(t)) = 0$ for almost all $t \ge 0$, it is enough to show that,
if $t$ is a regular time and $V(Q(t)) > 0$, then $V(Q(t))$ decreases
at least at a given rate.

We prove that for all regular times $t$ such that $V(Q(t)) > 0$,
for a system operating under any LHPF policy,

$$\frac{d}{dt} V(Q(t)) \le -\epsilon^*$$

Fix time $t$ and let $\gamma = V(Q(t)) = \max_i Q_i(t)$. Also, define

$$C = \{ i : Q_i(t) = \gamma \}$$

to be the set of heaviest ports at $t$. Also, let $\hat{\gamma} = \max_{i \notin C} Q_i(t)$
be the heaviest of the remaining ports. Since the number of
ports is finite, $\hat{\gamma} < \gamma$. Choose $\beta$ small enough so that (a)
$\hat{\gamma} < \gamma - 3\beta$, and (b) $\beta$ is small enough, in terms of $\gamma$ and
$N$, for inequality (8) below to hold. Here $N = \max\{N_1, N_2\}$.

Note that this implies that

$$\frac{N+1}{N} (\gamma - \beta) > \gamma + \beta \tag{8}$$
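As a check on how small $\beta$ needs to be, inequality (8) can be rearranged directly. The following is a worked rearrangement, not a statement of the bound the authors chose; any stricter bound on $\beta$ also suffices.

```latex
\frac{N+1}{N}(\gamma - \beta) > \gamma + \beta
\;\iff\; (N+1)(\gamma - \beta) > N(\gamma + \beta)
\;\iff\; \gamma > (2N+1)\,\beta
\;\iff\; \beta < \frac{\gamma}{2N+1}.
```

For instance, for an $8 \times 8$ switch ($N = 8$), inequality (8) holds exactly when $\beta < \gamma/17$.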



Recall that $Q(t)$ is absolutely continuous. This means that
there exists a $\delta$ small enough, so that at all times $\tau \in (t, t+\delta)$
the queues satisfy the following conditions:

(C1) $Q_i(\tau) \in (\gamma - \frac{\beta}{2}, \gamma + \frac{\beta}{2})$ for all $i \in C$
(C2) $Q_j(\tau) < \gamma - \frac{5\beta}{2}$ for all $j \notin C$

Let $x_{k_l}$ be a positive subsequence for which the convergence
to the fluid limit holds. Consider $l$ large enough so that, for
all ports $i$ and all $\tau \in (t, t+\delta)$,

$$\left| \frac{q_i(x_{k_l} \tau)}{x_{k_l}} - Q_i(\tau) \right| < \frac{\beta}{2}$$

Consider time slots $T := \{ \lceil x_{k_l} t \rceil, \lceil x_{k_l} t \rceil + 1, \ldots, \lfloor x_{k_l}(t + \delta) \rfloor \}$.
Lemma 10, which we prove immediately after the current
proof, shows that all ports that are critical at the fixed time
$t$ in the fluid limit will be scheduled at all time slots $n \in T$.
The conditions (C1) and (C2) can be rewritten as follows for
the original switching system.

(C1*) $q_i(n) \in [x_{k_l}(\gamma - \beta), x_{k_l}(\gamma + \beta)]$ for all $i \in C$
(C2*) $q_i(n) < x_{k_l}(\gamma - 2\beta)$ for all $i \notin C$



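Now, assuming that an LHPF policy indeed schedules every
port $i \in C$ at all times $n \in T$,

$$\sum_{M \in \mathcal{M}} M_i \left( h_M(\lfloor x_{k_l}(t+\delta) \rfloor) - h_M(\lceil x_{k_l} t \rceil) \right) = \lfloor x_{k_l}(t+\delta) \rfloor - \lceil x_{k_l} t \rceil \tag{9}$$

Now, by dividing both sides by $x_{k_l}\delta$ and letting $l \to \infty$, we obtain:

$$1 \ge \frac{\sum_{M \in \mathcal{M}} M_i \left( h_M(x_{k_l}(t+\delta)) - h_M(x_{k_l} t) \right)}{x_{k_l} \delta} \ge \frac{\lfloor x_{k_l}(t+\delta) \rfloor - \lceil x_{k_l} t \rceil}{x_{k_l} \delta} \to 1 \tag{10}$$

Hence, for $\delta \to 0$,

\begin{align}
\sum_{M \in \mathcal{M}} M_i \dot{H}_M(t) &= \lim_{\delta \to 0} \sum_{M \in \mathcal{M}} M_i \frac{H_M(t+\delta) - H_M(t)}{\delta} \tag{11} \\
&= \lim_{\delta \to 0} \lim_{l \to \infty} \frac{\sum_{M \in \mathcal{M}} M_i \left( h_M(x_{k_l}(t+\delta)) - h_M(x_{k_l} t) \right)}{x_{k_l} \delta} = 1 \quad \text{by Eq. (10)} \tag{12}
\end{align}

So, by Eq. (7) it follows that, $\forall i \in C$,

$$\frac{dQ_i(t)}{dt} = -(1 - \lambda_i) \le -\epsilon^*.$$

Also, every port $i \notin C$ has weight strictly lower than every
port in $C$, for the entire duration $(t, t + \delta)$. Thus it follows
that

$$\frac{d}{dt} V(Q(t)) \le -\epsilon^*$$

This proves the theorem. ■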
Lemma 10. For all times $n \in T$, any LHPF policy will
match all ports that are in $C$ at time $t$ in the fluid limit.

Proof of Lemma 10:

Let $C_1 \subseteq C$ be the set of input ports in $C$, and $C_2 \subseteq C$ the
set of output ports. We will first show that all ports in $C_1$
can be matched, by showing that Hall's condition (given in
Lemma 1) holds for this set. By symmetry, all ports in $C_2$
can be matched, and by Lemma 2 we conclude that all ports
in $C$ can be matched.

Fix time $n \in T$, and for any subset $S \subseteq C_1$ let $\mathcal{N}_n(S)$ be
its neighborhood at time $n$. Suppose now that $S$ fails Hall's
condition, i.e. that $|S| \ge |\mathcal{N}_n(S)| + 1$. Now, each $i \in S$ has
$q_i(n) \ge x_{k_l}(\gamma - \beta)$, by condition (C1*). This means that

$$\sum_{i \in S} q_i(n) \ge |S| \, x_{k_l} (\gamma - \beta) \ge (|\mathcal{N}_n(S)| + 1) \, x_{k_l} (\gamma - \beta)$$

Now, each packet in $q_i(n)$, $i \in S$, is destined for one node in
$\mathcal{N}_n(S)$, which means that

$$\sum_{j \in \mathcal{N}_n(S)} q_j(n) \ge \sum_{i \in S} q_i(n)$$

(LHS and RHS may not be equal because there might be
other input ports with packets for ports in $\mathcal{N}_n(S)$). This
means that there exists one node $j^* \in \mathcal{N}_n(S)$ with queue

$$q_{j^*}(n) \ge \frac{1}{|\mathcal{N}_n(S)|} \sum_{j \in \mathcal{N}_n(S)} q_j(n) \ge \left( \frac{|\mathcal{N}_n(S)| + 1}{|\mathcal{N}_n(S)|} \right) x_{k_l} (\gamma - \beta)$$

Further, $|\mathcal{N}_n(S)| \le N$, so we have that

$$\frac{|\mathcal{N}_n(S)| + 1}{|\mathcal{N}_n(S)|} \ge \frac{N + 1}{N}$$

which gives, from (8),

$$q_{j^*}(n) \ge \frac{N+1}{N} \, x_{k_l} (\gamma - \beta) > x_{k_l} (\gamma + \beta)$$

However, this means that $j^*$ violates the fact, implied by
(C1*) and (C2*), that $q_j(n) \le x_{k_l}(\gamma + \beta)$ for all ports $j$.
This is a contradiction, and thus it has to be that Hall's
condition is satisfied at $n$.

Thus, there exists a matching that matches all input ports in
$C_1$. Similarly, it can be shown there exists a matching that
matches all output ports in $C_2$. By Lemma 2 this means
there exists a matching that matches all ports in $C$. From
conditions (C1*) and (C2*), it follows that this matching
matches all ports with weight at least $x_{k_l}(\gamma - \beta)$. So its
threshold is $x_{k_l}(\gamma - \beta)$, or lower. Now, by definition of LHPF,
this means that the threshold of any LHPF matching cannot
be greater than $x_{k_l}(\gamma - \beta)$. This means the LHPF matching
schedules all ports with weights at least $x_{k_l}(\gamma - \beta)$, i.e. all
ports in $C$. ■

Figure 5: Delay of an 8 x 8 switch under symmetric
Bernoulli Traffic
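Hall's condition, on which the proof above relies, can be checked by brute force on small instances. The sketch below is illustrative only: the helper name `hall_violation` and the example neighborhoods are hypothetical, and the enumeration is exponential in the number of ports.

```python
from itertools import combinations

def hall_violation(ports, neighbors):
    """Return a subset S of `ports` with |S| > |N(S)|, or None if Hall's
    condition holds, i.e. if all of `ports` can be matched simultaneously."""
    for r in range(1, len(ports) + 1):
        for subset in combinations(ports, r):
            nbhd = set().union(*(neighbors[p] for p in subset))
            if len(subset) > len(nbhd):
                return subset
    return None

# Hypothetical neighborhoods: output ports for which each input holds packets.
neighbors = {"i1": {"o1"}, "i2": {"o1"}, "i3": {"o1", "o2"}}
```

Here `hall_violation(["i1", "i2"], neighbors)` returns `("i1", "i2")`, since both inputs compete for the single output `o1`, while `hall_violation(["i1", "i3"], neighbors)` returns `None`.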

6. SIMULATIONS 

In this section we compare the delay performance of the
MVM algorithm with MWM-$\alpha$ algorithms. MVM lies in
the class of LHPF policies. We implemented a packet level
simulator in Java. The simulations are run long enough so
that the half-width of the 99% confidence interval is within
1% of the mean.
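A stopping rule of this kind could be sketched as follows. The paper does not say how the confidence interval was computed, so this sketch rests on assumptions: the samples are treated as independent delay estimates (for example, means of independent runs or batches), and a normal approximation is used. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def relative_half_width(samples, confidence=0.99):
    """Half-width of a normal-approximation confidence interval for the
    mean, expressed as a fraction of the sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return half_width / mean(samples)

def precise_enough(samples, target=0.01):
    """True once the relative half-width is within the target (1% here)."""
    return len(samples) >= 2 and relative_half_width(samples) <= target
```

A simulation loop would keep adding samples until `precise_enough` returns true.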



Figure 6: Delay of a 8 x 8 switch under symmetric 
Bursty Traffic 



We simulate an 8 x 8 switch with symmetric loading on each
edge. We simulate two types of arrival processes, Bernoulli
and a more bursty arrival process. Each arrival stream
injects packets independently in the system. Clearly, these
processes satisfy the strong law of large numbers and the switch
is guaranteed to be stable. The model of the bursty arrival
process is described below.

Bursty Arrival Processes: The arrival stream is a series
of active and idle periods. During the active periods, the
source injects one packet into the queue in every time slot.
The length of the active periods (denoted by random
variable $a$) is distributed according to the Zipf law with power
exponent 1.25 and support {1, 2, 3, ..., 100}. Heavy tailed
distributions like Zipf have been found to model Internet
traffic [7]. The idle periods are geometrically
distributed with mean $p$. The mean arrival rate of a source
can be controlled by changing the value of $p$.
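A source of this kind could be generated as in the sketch below. It is not the authors' Java simulator. It assumes that idle periods are geometric on {1, 2, ...} (so the mean idle length must be at least 1), and the function names are hypothetical. Under these assumptions the long-run arrival rate is E[a] / (E[a] + p).

```python
import random

SUPPORT = list(range(1, 101))                 # active lengths 1..100
ZIPF_WEIGHTS = [k ** -1.25 for k in SUPPORT]  # unnormalized Zipf weights

def mean_active_length():
    """Mean of the truncated Zipf distribution of active-period lengths."""
    return sum(k * w for k, w in zip(SUPPORT, ZIPF_WEIGHTS)) / sum(ZIPF_WEIGHTS)

def bursty_arrivals(mean_idle, slots, rng):
    """Return a 0/1 list of length `slots`; 1 means a packet arrives in that slot.
    Assumes mean_idle >= 1 (idle periods are geometric on {1, 2, ...})."""
    out = []
    while len(out) < slots:
        active = rng.choices(SUPPORT, weights=ZIPF_WEIGHTS)[0]
        out.extend([1] * active)              # one packet per active slot
        idle = 1
        while rng.random() >= 1.0 / mean_idle:
            idle += 1                         # extend the idle period
        out.extend([0] * idle)
    return out[:slots]

arrivals = bursty_arrivals(mean_idle=10, slots=200000, rng=random.Random(0))
empirical_rate = sum(arrivals) / len(arrivals)
predicted_rate = mean_active_length() / (mean_active_length() + 10)
```

Increasing `mean_idle` lowers the arrival rate, which is how the load on each edge would be tuned in this sketch.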

The results for Bernoulli traffic in Fig. 5 show that the delay
of the MVM policy is smaller than that of the MWM policy.
MWM-$\alpha$ policies have been studied in the literature [14], [27]
and have been reported to incur smaller delay as the value of
$\alpha$ goes to zero. Our simulations confirm this observation and
also show that the delay performance of the MVM policy is
no worse than the MWM-$\alpha$ policies even for small values of
$\alpha$.


Fig. 6 shows the delay for the bursty arrival process
described above. The delay is significantly higher for the more
bursty arrival process as compared to Bernoulli traffic. It
seems that although the MVM and MWM-$0^+$ policies have
different tie-breaking rules, their delay performance is
actually quite similar.

7. DISCUSSION 

This paper proposes a new class of online policies, called
LHPF policies, for scheduling in input-buffered crossbar switches.
LHPF policies are both throughput optimal for a system
with arrivals, and clearance-time optimal for a system
without arrivals. To our knowledge, this is the first class of
online policies that achieves both objectives. We also provide
necessary and sufficient conditions for any policy to be
clearance-time optimal, and show that popular existing policies
(like MWM, MSM and Greedy/GMM) can have clearance
time as large as twice the optimal. A particular policy in
our class, MVM, has worst-case complexity similar to MSM
(which is not known to be throughput-optimal), and
empirical delay performance better than MWM.

As noted in [19], the MVM policy combines the benefit of a
maximum size algorithm with those of a maximum weight
algorithm, while lending itself to simple implementation in
hardware. In MVM, each weight is a function of queue
lengths (the sum over all edges that touch a node), and hence
MVM has the advantage of maximum size matchings, namely
high instantaneous throughput, while guaranteeing high
throughput even when the arrival traffic is non-uniform. The LHPF
class of policies does not care about the weight of edges as long
as the required set of nodes above the optimal threshold has
been scheduled. This reduces the computational overhead
for the scheduler while maintaining the throughput
guarantee.

Philosophically, this paper departs from the prevalent edge- 
based approach to scheduling, as exemplified by MWM (sched- 
ule the heaviest queues), MSM (biggest number of queues) 
or Greedy. Instead we concentrate on the most congested 
ports. It would be interesting to see if the results of this pa- 
per generalize to other interference models (e.g. those that 
arise in wireless networks). In particular, ports in switches 
represent the scheduling constraints (at most one edge per 
port can be scheduled). More generally, we might concen- 
trate on the most-congested constraints, like e.g. cliques in 
the conflict-graph setting. For such a setting the Lyapunov 
function may be the heaviest constrained set. 

Our Lyapunov function is evocative of the one used by [6] for
the analysis of Greedy (GMM). However, we emphasize that
ours is a function of the total queues at ports, while theirs
is of every single queue. Besides the Lyapunov function of
[6], other popular Lyapunov functions are also all based on
individual queue lengths, e.g. the sum of squares of queue
lengths (for MWM).

In the fluid limit, [6] can guarantee that among the set of
critical queues, a maximal set of queues can be scheduled
at every time slot. The GMM policy is throughput optimal
only when the underlying graph satisfies a local pooling
condition. Note that the GMM policy is not throughput
optimal for switches. In contrast, using the node based
formulation, we have been able to prove that LHPF policies
are throughput optimal because they guarantee that every
port that is critical in the fluid limit can be scheduled at
every time slot. This is because of the special (bipartite)
structure of the graph of a switch. More generally, it has been
shown that the GMM policy achieves at least a portion of
the capacity region which is given by the local-pooling
factor [11], [12], [2]. It would be very interesting to see if this
approach would lead to the development of simpler policies
with throughput guarantees for more general classes of
networks, especially since the MWM matching problem, although
throughput optimal, has exponential complexity in the
general setting.